BACKGROUND OF THE INVENTION

Field of the Invention 

The invention relates generally to the field of branch prediction. More
specifically, the invention relates to the use of a Speculative Branch Target Buffer
(SBTB) to maintain speculative branch data for in-flight branches.



Description of the Related Art 

Early microprocessors generally processed instructions one at a time. Each
instruction was processed using separate sequential stages (e.g., instruction fetch,
instruction decode, execute, and result writeback). Within such microprocessors, different
dedicated logic blocks performed each different processing stage. Each logic block
waited until all the previous logic blocks completed operations before beginning its
operation.

To improve efficiency, microprocessor designers overlapped the operations of the
logic blocks for the instruction processing stages such that the microprocessor operated
on several instructions simultaneously. In operation, the logic blocks and hence the
corresponding instruction processing stages concurrently process different instructions.
At each clock tick, the result of each processing stage is passed to the subsequent
processing stage. Microprocessors that use the technique of overlapping instruction
processing stages are known as "pipelined" microprocessors. Some microprocessors
further divide each processing stage into substages for additional performance
improvement. Such processors are referred to as "deeply pipelined" microprocessors.
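
By way of illustration only, the benefit of overlapping the processing stages can be
seen from a simple cycle count. The sketch below assumes an idealized pipeline in which
every stage takes one clock tick and no stalls or flushes occur; the function names are
hypothetical and are not part of the described apparatus.

```python
def sequential_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes every stage before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles for an ideal pipeline: fill it once, then complete one instruction per tick."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)
```

Under these assumptions, ten instructions on a five-stage design take 50 ticks when
processed one at a time and 14 ticks when pipelined.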

An example of a simplified instruction pipeline 100 is shown in Figure 1.
According to this simplified example, the instruction pipeline 100 comprises five major
stages 105-125. The five major stages are the fetch stage 105, the decode stage 110, the
dispatch stage 115, the execute stage 120, and the writeback stage (also referred to as the
retirement stage) 125. Briefly, during the first stage, the fetch stage 105, one or more
instructions are retrieved from memory, and subsequently decoded into micro-ops during
the decode stage 110. Then, the micro-ops are dispatched to the appropriate execution
unit for execution during the dispatch stage 115, and execution takes place during the
execute stage 120. Finally, as the micro-ops complete execution, they are marked as
being ready for retirement and are subsequently retired (e.g., their results are committed
to the architectural registers) during the retirement stage 125. Consequently, the fetch
unit (not shown) at the head of the pipeline provides the pipeline with a continuous flow
of instructions, hence keeping the microprocessor busy. The fetch unit maintains this
constant flow of instructions so that the microprocessor does not have to stop execution
to fetch an instruction from memory. Such fetching guarantees continuous execution, as
long as the instructions are stored in order of execution. However, due to branch
instructions, such as conditional branch instructions included in software loops or
conditional jumps, instructions encountered by the fetch unit are not always presented in
a sequence corresponding to the order of execution. Thus, branch instructions can cause
pipelined microprocessors to speculatively execute down the wrong path such that the
microprocessor must later flush the speculatively executed instructions and restart at a
corrected address.

As a result, many pipelined microprocessors employ branch prediction techniques
to predict the outcome of branch instructions (e.g., determine which instruction to fetch
next). Generally speaking, branch prediction seeks to guess whether or not a branch
encountered in the instruction stream will be taken, and to fetch executable code
from the appropriate location in the instruction stream. When a branch instruction is
executed, it and the branch target address (i.e., the address of the instruction to be
executed if the branch is taken) are stored in a branch target buffer (BTB). This and other
information is subsequently used to predict which way the instruction will branch the next
time it is executed. Mispredicted branches still cause the instruction pipeline to stall
while the incorrect sequence of instructions that have been fetched and have begun
execution are flushed from the instruction pipeline. However, when the branch prediction
is correct (as it is over 90 percent of the time), executing a branch does not cause a
pipeline stall, as the processor may fetch and begin executing the proper sequence of
instructions in advance.

An earlier branch target buffer cache implementation is illustrated in Figures 2
and 3. The branch target buffer (BTB) 200 depicted in Figure 2 is a set-associative cache
that stores information about branch instructions in 128 individual "lines" of branch
information. Each line of branch information in the BTB 200 contains four branch
entries, each of which contains information about a single branch instruction that the
microprocessor has previously executed (if the valid bit is set in the entry). Each line
also includes a branch pattern table 221 and least recently replaced (LRR) bits 220. The
branch pattern table 221 is used for predicting the outcome of conditional branch
instructions in the line of branch entries. The LRR bits 220 are used by the branch
prediction circuit to select a branch entry in the line when information about a new branch
will be written into the line of branch entries.

Figure 3 illustrates the branch information stored within each branch entry of the
BTB 200. As illustrated in Figure 3, each branch entry contains a tag field 310, a block
offset field 320, a branch type field 330, a true history field 340, a speculative history
field 350, a history selection bit 370, a valid bit 380, and a branch target address field
390. The tag field 310 and the block offset field 320 are used to identify a memory address
of the branch instruction associated with the branch entry. The branch type field 330
specifies what type of branch instruction the branch entry identifies (e.g., conditional
branch, return from subroutine, call subroutine, unconditional branch). The true history
field 340 maintains the actual (fully-resolved) taken or not-taken history of the branch
instruction for a predetermined number of prior executions. The speculative history field
350 maintains the "speculative" taken or not-taken history of the branch instruction for
the predetermined number of prior executions. The history selection bit 370 indicates
which of the true history field 340 or the speculative history field 350 will be used to index
into a pattern state table when calculating a branch prediction. The valid bit 380 indicates
whether or not the branch entry contains valid branch information. The valid bit 380 is
typically set during the execute or retirement stage when the branch prediction circuit
allocates and fills the corresponding branch entry. The valid bit 380 is cleared when the
branch entry is subsequently deallocated by the branch prediction circuit.
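
By way of illustration only, the branch entry of Figure 3 may be modeled as a simple
record. The sketch below is a software analogy rather than the circuit itself. The
4-bit history width, the field names, and the polarity of the history selection bit
(set meaning "use the speculative history") are assumptions made for the example; the
text specifies only "a predetermined number" of prior executions.

```python
from dataclasses import dataclass

HISTORY_BITS = 4  # assumed width; the text says only "a predetermined number"


@dataclass
class BranchEntry:
    tag: int = 0                       # tag field 310
    block_offset: int = 0              # block offset field 320
    branch_type: str = "conditional"   # branch type field 330
    true_history: int = 0              # true history field 340 (resolved outcomes)
    speculative_history: int = 0       # speculative history field 350
    use_speculative: bool = False      # history selection bit 370 (polarity assumed)
    valid: bool = False                # valid bit 380
    target: int = 0                    # branch target address field 390

    def pattern_index(self) -> int:
        """Return the history value used to index the pattern state table."""
        history = self.speculative_history if self.use_speculative else self.true_history
        return history & ((1 << HISTORY_BITS) - 1)
```

In this model, toggling `use_speculative` switches which of the two histories is
presented to the pattern state table, which is the role the text assigns to the history
selection bit 370.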

Because many of the fields (e.g., tag 310, valid 380, block offset 320, LRR 220,
pattern table 221, true history 340, and speculative history 350) of the BTB 200 must be
accessed by various pipeline stages, the BTB 200 must include multiple ports for
reading/writing the appropriate fields at prediction time and reading/writing the
appropriate fields during allocation, update, and deallocation of branch entries.

In such a prior BTB 200, branch entries are typically allocated at execute or retire
time to avoid allocating entries along a mispredicted path. This, however, results in
mispredicting tight loops until they are allocated. For deallocation, two consecutive lines
of instruction are deallocated when a bogus branch is encountered, resulting in
deallocation of good branches. Finally, branches are typically updated at execute time
instead of retirement to improve prediction. This, however, often results in corruption
since not all executed branches retire.

BRIEF DESCRIPTION OF THE DRAWINGS

The appended claims set forth the features of the invention with particularity. The
invention, together with its advantages, may be best understood from the following
detailed description taken in conjunction with the accompanying drawings of which:

Figure 1 illustrates a simplified instruction pipeline.

Figure 2 illustrates a prior art branch target buffer (BTB) implementation.

Figure 3 illustrates branch information stored within each branch entry of the BTB
of Figure 2.

Figure 4A is a block diagram of a computer system in which one embodiment of
the present invention may be implemented.

Figure 4B is a simplified block diagram of various microprocessor units that may
interact with the branch prediction circuit of the present invention.

Figure 5 is a simplified block diagram of a branch prediction circuit according to
one embodiment of the present invention.

Figure 6 is a flow diagram illustrating branch entry processing according to one
embodiment of the present invention.

DETAILED DESCRIPTION

A method and apparatus are described for improving the performance of branch
prediction using a combination of a speculative branch target buffer and an architectural
branch target buffer. According to one embodiment, a branch target buffer includes both
a speculative branch target buffer (SBTB) and an architectural branch target buffer
(ABTB). The SBTB may be implemented as a relatively small structure that supports the
ABTB, and that can be used to maintain speculative branch data for in-flight branches
(i.e., those that have been fetched but not yet retired). Thus, the ABTB need only store
the architectural or actual branch data. The combination of ABTB and SBTB described
herein seeks to improve the cost and performance of branch prediction, which in turn
lowers the cost and improves the performance of a microprocessor.

According to one embodiment, the SBTB allows the speculative history and the
selection bit to be eliminated from the ABTB, and allows the ABTB to be single-ported,
saving area that can be traded for performance. As will be described further below,
branches can be allocated speculatively in the SBTB at the time of decode, helping avoid
misprediction in tight loop branches. Bogus branches are also deallocated at decode time.
They are deallocated in the line containing the branch, and in the next line only if it is a
consecutive line, thereby eliminating unnecessary deallocation.

The branch entry is updated speculatively at prediction time, and corrected at
execution time in the SBTB, thereby reducing the number of ABTB accesses. Further,
the branches may be updated in the ABTB only after the last of the branches in the line
retires, to reduce update traffic to the ABTB. Both of these make a single-ported ABTB
possible. Finally, there is no corruption of branch data as a result of mispredicted
branches because the update is at retire time.



Docket No.: 042390.P5563 

Express Mail No.: EL591668250US 7 



According to one embodiment of the present invention, the apparatus includes an
SBTB in which all entries are searched in parallel to determine whether the set
matches a fetch instruction pointer (IP). The SBTB, a FIFO or circular buffer,
allocates an entry when an instruction line containing a conditional branch is fetched and
decoded, and deallocates it when the last branch in the line retires. The branch
prediction is made based on the youngest (e.g., the most recently allocated or updated) of
the entries in the ABTB or the SBTB. Branch allocation/deallocation is done at branch
decode time on the SBTB, leaving the ABTB untouched. Speculative prediction is
continuously made, assuming it is correct, for subsequent processing, until an actual entry
is made in the architectural history. Further, any mispredicted entries are corrected at
execution time on the SBTB, and branch update is done on the ABTB at retirement. The
method is designed to reduce the cost of branch prediction and increase its performance,
thereby producing an efficient, yet affordable, microprocessor.
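
By way of illustration only, the allocate, search, and deallocate behavior described
above may be sketched in software as follows. This is a toy model under stated
assumptions, not the claimed circuit: the class name, the dictionary fields, the
behavior when the buffer is full, and the rule that retirement applies to the oldest
matching line are all hypothetical. The parallel set comparison performed in hardware
is modeled here as a loop.

```python
from collections import deque


class SpeculativeBTB:
    """Toy FIFO model of the SBTB; depth and field names are assumptions."""

    def __init__(self, depth: int):
        self.depth = depth
        self.lines = deque()  # oldest line on the left, youngest on the right

    def allocate(self, set_number: int, num_branches: int) -> None:
        """Allocate a line at decode time for a fetched line that holds branches."""
        if len(self.lines) == self.depth:
            raise RuntimeError("SBTB full; a real design would have to stall")
        self.lines.append({"set": set_number, "pending": num_branches})

    def lookup(self, fetch_set: int):
        """Compare every line's set with the fetch IP's set; the youngest match wins."""
        for line in reversed(self.lines):
            if line["set"] == fetch_set:
                return line
        return None

    def retire_branch(self, set_number: int) -> bool:
        """Retire one branch of the oldest matching line; free the line after the last."""
        for line in self.lines:
            if line["set"] == set_number:
                line["pending"] -= 1
                if line["pending"] == 0:
                    self.lines.remove(line)
                    return True  # line deallocated
                return False
        raise KeyError(set_number)
```

In this model a line stays in the FIFO until `retire_branch` has been called once for
each of its branches, which mirrors deallocation "when the last branch in the line
retires."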

In the following description, for the purposes of explanation, numerous specific
details are set forth in order to provide a thorough understanding of the present invention.
It will be apparent, however, to one skilled in the art that the present invention may be
practiced without some of these specific details. In other instances, well-known
structures and devices are shown in block diagram form.

Importantly, the method and apparatus of the present invention conceptually
operate at a layer above branch prediction. Therefore, while embodiments of the present
invention will be described with reference to branch prediction algorithms employing
pattern tables, the method and apparatus described herein are equally applicable to other
branch prediction techniques, such as the Yeh algorithm (see Tse-Yu Yeh and Yale N.
Patt, "Two-Level Adaptive Branch Prediction," The 24th ACM/IEEE International
Symposium and Workshop on Microarchitecture, November 1991, pp. 51-61), and other
static and dynamic branch prediction mechanisms.

Computer System Overview 

Figure 4A illustrates a computer system 400 representing an exemplary target
system in which features of the present invention may be implemented. Computer system
400 comprises a bus or other communication means 401 for communicating information,
and a processing means such as processor 402 coupled with bus 401 for processing
information. The processor 402 comprises a novel branch prediction circuit 403 that will
be described further below.

Computer system 400 further comprises a random access memory (RAM) or other
dynamic storage device 404 (referred to as main memory), coupled to bus 401 for storing
information and instructions to be executed by processor 402. Main memory 404 also
may be used for storing temporary variables or other intermediate information during
execution of instructions by processor 402. Computer system 400 also comprises a read
only memory (ROM) and/or other static storage device 406 coupled to bus 401 for storing
static information and instructions for processor 402.

A data storage device 407 such as a magnetic disk or optical disc and its
corresponding drive may also be coupled to computer system 400 for storing information
and instructions. Computer system 400 can also be coupled via bus 401 to a display device
421, such as a cathode ray tube (CRT) or Liquid Crystal Display (LCD), for displaying
information to an end user. For example, graphical and/or textual indications of installation
status, time remaining in the trial period, and other information may be presented to the
prospective purchaser on the display device 421. Typically, an alphanumeric input device
422, including alphanumeric and other keys, may be coupled to bus 401 for communicating
information and/or command selections to processor 402. Another type of user input
device is cursor control 423, such as a mouse, a trackball, or cursor direction keys for
communicating direction information and command selections to processor 402 and for
controlling cursor movement on display 421.

A communication device 425 is also coupled to bus 401. The communication
device 425 may include a modem, a network interface card, or other well-known interface
devices, such as those used for coupling to Ethernet, token ring, or other types of physical
attachment for purposes of providing a communication link to support a local or wide
area network, for example. In any event, in this manner, the computer system 400 may be
coupled to a number of clients and/or servers via a conventional network infrastructure,
such as a company's Intranet and/or the Internet, for example.

It is appreciated that a lesser or more equipped computer system than the example
described above may be desirable for certain implementations. Therefore, the
configuration of computer system 400 will vary from implementation to implementation
depending upon numerous factors, such as price constraints, performance requirements,
technological improvements, and/or other circumstances.

It should be noted that, while the steps described herein may be performed under
the control of a programmed processor, such as processor 402, in alternative
embodiments, the steps may be fully or partially implemented by any programmable or
hardcoded logic, such as Field Programmable Gate Arrays (FPGAs), TTL logic, or
Application Specific Integrated Circuits (ASICs), for example. Additionally, the method
of the present invention may be performed by any combination of programmed general
purpose computer components and/or custom hardware components. Therefore, nothing
disclosed herein should be construed as limiting the present invention to a particular
embodiment wherein the recited steps are performed by a specific combination of
hardware components.

Figure 4B is a simplified block diagram of processor 402 including various units
that may interact with the branch prediction circuit of the present invention. In this
example, the processor 402 includes a fetch unit 410, a branch prediction circuit 420, a
decode unit 430, an execution unit 440, a retirement unit 450, and a cache 460. The fetch
unit 410 retrieves instructions from cache and uses the instruction pointer (IP) to
continuously fetch based on the signals received from the branch prediction circuit 420.
In this example, the branch prediction circuit 420 includes a novel branch target buffer
(BTB) 470 comprising a speculative branch target buffer (SBTB) 490 and an architectural
branch target buffer (ABTB) 480. The branch prediction circuit 420 identifies branches and
predicts whether or not they will be taken based upon branch data contained in the SBTB 490
and the ABTB 480, as will be described further below. The SBTB 490 includes a
plurality of branch entries (not shown) to maintain speculative branch data associated
with in-flight branches, thereby reducing the burden on the ABTB 480 and allowing the
ABTB 480 to track only architectural branch data, such as the actual (fully-resolved)
taken/not-taken history associated with retired conditional branches.

Returning to the fetch unit 410, the fetching process of the fetch unit 410 is
interrupted if a branch is encountered, because the next instruction following the branch
needs to be resolved before further instructions can be fetched. The branch prediction
circuit 420 predicts the target address of the branch instruction based upon whether or not
the branch is predicted as taken. The branch prediction circuit 420 provides this address
to the fetch unit 410 to allow the fetch unit 410 to continue fetching instruction data.






The predicted target address is forwarded to the decode unit 430. The decode unit
430 verifies each branch prediction and decodes each branch instruction. While verifying
the results of the branch prediction, the decode unit 430 may deallocate any bogus
branches that it detects. A bogus branch is one predicted by the branch prediction circuit
420 at a location where no branch instruction exists.

The execution unit 440 then executes the branch instruction. The execution unit
440 compares the predicted branch target with the actual branch target, and hence may
determine whether the branch was correctly predicted. The execution unit 440 may
correct any mispredicted branches or mispredicted targets by flushing the head of the
pipeline and updating the corresponding branch entry in the SBTB 490.

Finally, the retirement unit 450 retires each branch instruction. According to one
embodiment, branch data may be updated at this point by stalling the prediction pipeline
and writing back a line to the ABTB 480 when the last branch in the line retires. By
updating branch data only when the last of the branches in the line has retired, update
traffic to the ABTB 480 is reduced, thus making it possible to implement the ABTB 480
as a single-ported cache. Additionally, branch updating during retirement eliminates
BTB corruption that may result from prior art update mechanisms that attempt to update
the BTB at execution time. While such update mechanisms may improve prediction,
corruption of the BTB may result since not all executed branches actually retire.
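
By way of illustration only, the corruption referred to above can be shown with a small
model of a taken/not-taken history. The sketch below assumes a 4-bit shift-register
history with the newest outcome in the least-significant bit; the function names and
the event encoding are hypothetical and are used only for this example.

```python
def update_history(history: int, taken: bool, bits: int = 4) -> int:
    """Shift one taken/not-taken outcome into a fixed-width history."""
    return ((history << 1) | int(taken)) & ((1 << bits) - 1)


def final_history(events, update_at: str) -> int:
    """Apply (taken, retires) events in execution order.

    update_at is "execute" (every executed branch updates the history) or
    "retire" (only branches that actually retire update it).
    """
    history = 0
    for taken, retires in events:
        if update_at == "execute" or retires:
            history = update_history(history, taken)
    return history
```

For the sequence taken and retired, not-taken on a wrong path and never retired, taken
and retired, an execute-time update records the wrong-path outcome in the history while
a retire-time update records only the two outcomes that architecturally occurred.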


Branch Prediction Circuit 

Figure 5 is a simplified block diagram of a branch prediction circuit 500 according
to one embodiment of the present invention. In the embodiment depicted, the branch
prediction circuit 500 includes an architectural branch target buffer (ABTB) 510, a
speculative branch target buffer (SBTB) 520, and selection logic 530. According to one
embodiment, the SBTB 520 is a relatively small structure supporting the ABTB 510. The
SBTB 520 is used to maintain the speculative branch data for in-flight branches, meaning
fetched branches that are not yet retired. In the embodiment depicted, the SBTB includes
an N-stage FIFO, where N is the number of stages in the processor's instruction pipeline,
and a branch allocation register 523. Each stage of the FIFO includes per-line fields 521
and per-way fields 522.

Per-line fields 521 include a set field, a pattern table, least recently replaced
(LRR) bits, a BAR index, and a sequential set indication. The set field identifies the set
number. In this manner, all entries of the SBTB 520 may be searched in parallel to see
whether the set matches the IP. The pattern table is typically updated at retirement.
However, it may be updated at prediction if deemed worthwhile for prediction accuracy.
The LRR bits point to the entry to be replaced if necessary. Preferably, entries outside the
line, or outside the execution path, are selected if possible. The BAR index indicates
the branch allocation register used for allocation, or that there is no allocation. If there is
an allocation, the LRR bits indicate the entry being replaced. This is used for any
subsequent predictions. The sequential set indication indicates whether the next set is a
sequential set. This is used to deallocate entries in the next set in the case of a bogus
branch.
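
By way of illustration only, the use of the sequential set indication during bogus
branch deallocation may be sketched as follows. The function name, the default of 128
sets, and the wrap-around of the set number are assumptions made for the example and
are not taken from the described embodiment.

```python
def sets_to_deallocate(line_set: int, next_is_sequential: bool, num_sets: int = 128) -> list:
    """Return the sets whose entries are cleared when a bogus branch is found in line_set."""
    targets = [line_set]
    if next_is_sequential:
        # Only touch the next set when it really is the consecutive line.
        targets.append((line_set + 1) % num_sets)  # wrap-around is an assumption
    return targets
```

When the indication is clear, only the line containing the bogus branch is affected, so
valid branches in an unrelated next line are left in place.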

Per-way fields 522 include a valid indication, an order field, a speculative bit,
history information, and a prediction bit. The valid indication indicates whether or not
the branch is valid. This bit is set on allocation and cleared on deallocation. The order
field indicates the order of the branch offsets from lowest to highest. The speculative bit
indicates that the branch was speculatively updated. This bit is cleared when updated at
retirement. It is also used to deallocate the line when the last branch is updated. History
information contains the latest history copies from the ABTB or the SBTB. This allows
the pattern table to be updated at retirement. Finally, the prediction bit represents the
prediction. The prediction bit is concatenated with the last 3 history bits to form the
history to be used for the next prediction.
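
By way of illustration only, the concatenation of the prediction bit with the last 3
history bits may be expressed as a shift and an OR. The sketch below assumes that the
prediction becomes the newest, least-significant bit of a 4-bit history; the bit
ordering and the function name are assumptions, since the text does not specify them.

```python
def next_history(history: int, prediction: bool) -> int:
    """Keep the last 3 history bits and append the prediction bit (4 bits total)."""
    return ((history & 0b111) << 1) | int(prediction)
```

For example, under this assumed ordering a history of 1011 followed by a taken
prediction yields 0111 as the history used for the next prediction.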

Branch allocation registers each include an indication of the type of branch being 
allocated, the tag of the branch, the offset of the branch, and history to be initialized based 
upon the type. 

Because the SBTB 520 is read/written during the decode stage for allocation of
branch entries, during the execution stage for speculative update of branch entries, and
during the retirement stage to correct branch entries, it is preferable to implement the
SBTB 520 as a dual-ported memory.

The ABTB 510 need only be read during branch prediction and written when
branches in the SBTB 520 have retired. Consequently, the ABTB 510 may be
implemented with a single read port and a single write port. Alternatively, the ABTB 510
may be implemented as a single-ported memory in which reading and writing occur over
the same shared port.

Selection logic 530 selects between the ABTB output and the SBTB output, 
depending upon which one of the two contains the youngest entry. 
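
By way of illustration only, the choice made by the selection logic may be sketched as
a comparison of ages. The sketch below assumes that each structure reports at most one
matching entry, tagged with a hypothetical sequence stamp in which a larger value means
more recently allocated or updated, and that a tie favors the SBTB; none of these
details is specified in the text.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    source: str   # "ABTB" or "SBTB"
    stamp: int    # larger means more recently allocated or updated (assumed)
    taken: bool   # predicted direction carried by the entry


def select_youngest(abtb: Optional[Candidate], sbtb: Optional[Candidate]) -> Optional[Candidate]:
    """Return the younger of the two hits, or whichever one exists, or None."""
    if abtb is None:
        return sbtb
    if sbtb is None:
        return abtb
    return sbtb if sbtb.stamp >= abtb.stamp else abtb  # tie-break is an assumption
```

A hit in only one structure is passed through unchanged, and a miss in both yields no
prediction from the buffers.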


Branch Entry Processing 

Figure 6 is a flow diagram illustrating branch entry processing according to one
embodiment of the present invention. When no entry is found in the ABTB (601), an entry
is allocated in the SBTB at decode time (603). In the case of a bogus branch (604),
deallocation is performed at decode time (605); otherwise, the branch is predicted
speculatively (606). The speculative prediction continues (606), assuming the prediction
is correct, for the subsequent entries until an actual entry is found in the ABTB (607).
Once there is an actual entry in the ABTB, any corresponding entry in the SBTB is decoded
in order to avoid duplication (608). Any mispredicted branches and mispredicted targets
are corrected at execute time, and entries are later executed (610). Finally, the executed
branch instructions are retired (611). The branch history and pattern table (PT) are
updated, but only branches that actually retire update the ABTB. Since not all executed
branches retire, branch update at retire time eliminates corruption.


In the foregoing specification, the invention has been described with reference to
specific embodiments thereof. It will, however, be evident that various modifications and
changes may be made thereto without departing from the broader spirit and scope of the
invention. The specification and drawings are, accordingly, to be regarded in an
illustrative rather than a restrictive sense.






CLAIMS 



What is claimed is:



1. A method comprising maintaining speculative branch data for in-flight branches
in a speculative branch target buffer (SBTB) cache by speculatively allocating a
branch entry in a line of the SBTB after decoding an instruction containing a
branch, speculatively updating branch data associated with the branch entry after
branch prediction has been completed for the branch, and correcting the branch
data after the branch has been executed.

2. The method of claim 1, wherein the branch data includes a speculative history
field representing the speculative taken or not-taken history of the branch for a
predetermined window of executions of the branch, and wherein said
speculatively updating branch data comprises updating the speculative history
field to reflect the taken or not-taken status of its most recent execution.

3. The method of claim 1, wherein the line has a corresponding pattern table, and
wherein said speculatively updating branch data comprises updating the pattern
table.

4. The method of claim 1, wherein the branch comprises a conditional branch.

5. The method of claim 1, wherein the branch comprises a return from a subroutine.

6. The method of claim 1, wherein the branch comprises a call to a subroutine.

7. The method of claim 1, wherein the branch comprises an unconditional branch.

8. A method comprising:
    speculatively allocating a first branch entry for a conditional branch in a
        speculative branch target buffer (SBTB) prior to execution of the
        conditional branch responsive to decoding the conditional branch and
        finding no branch entry in an architectural branch target buffer (ABTB)
        corresponding to the conditional branch;
    speculatively allocating a second branch entry for the conditional branch in the
        SBTB responsive to a subsequent failed attempt to locate a branch entry in
        the ABTB corresponding to the conditional branch;
    allocating a third branch entry for the conditional branch in the ABTB after
        retirement of the conditional branch; and
    subsequently performing branch prediction for the conditional branch by
        determining a predicted target address branch based upon branch data
        associated with the second branch entry.

9. The method of claim 8, further comprising speculatively updating branch data
associated with the first branch entry after said performing branch prediction for
the conditional branch.

10. A branch prediction circuit comprising:
    speculative branch target buffer (SBTB) means for maintaining speculative branch
        data associated with in-flight branches;
    architectural branch target buffer (ABTB) means, coupled to the SBTB means, for
        maintaining architectural branch data for branches corresponding to retired
        instructions; and
    target address generation means coupled to both the SBTB means and the ABTB
        means for determining a predicted target address based upon the
        speculative branch data and the architectural branch data.

11. The branch prediction circuit of claim 10, wherein the SBTB means comprises a
FIFO having entries corresponding to each of a plurality of pipeline stages of a
microprocessor instruction pipeline.

12. The branch prediction circuit of claim 10, wherein the SBTB means includes a
single read port and a single write port.

13. The branch prediction circuit of claim 10, wherein the SBTB means comprises a
single-ported memory.






14. A machine-readable medium having stored thereon data representing sequences of
instructions, the sequences of instructions which, when executed by a processor,
cause the processor to:
    speculatively allocate a first branch entry for a conditional branch in a speculative
        branch target buffer (SBTB) prior to execution of the conditional branch
        responsive to decoding the conditional branch and finding no branch entry
        in an architectural branch target buffer (ABTB) corresponding to the
        conditional branch;
    speculatively allocate a second branch entry for the conditional branch in the
        SBTB responsive to a subsequent failed attempt to locate a branch entry in
        the ABTB corresponding to the conditional branch;
    allocate a third branch entry for the conditional branch in the ABTB after
        retirement of the conditional branch; and
    subsequently perform branch prediction for the conditional branch by determining
        a predicted target address branch based upon branch data associated with
        the second branch entry.

15. The machine-readable medium of claim 14, wherein the sequences of instructions
further cause the processor to speculatively update branch data associated with the
first branch entry after said performing branch prediction for the conditional
branch.
16. A branch prediction circuit comprising:
    a speculative branch target buffer (SBTB) cache having a plurality of branch
        entries to maintain speculative branch data associated with in-flight
        branches, the speculative branch data including a speculative history of
        taken/not-taken outcomes associated with the in-flight branches; and
    an architectural branch target buffer (ABTB) cache, coupled to the SBTB cache,
        the ABTB cache having a plurality of branch entries to maintain
        architectural branch data including the actual taken/not-taken outcomes
        associated with retired conditional branches.

17. The branch prediction circuit of claim 16, wherein the SBTB cache comprises a
FIFO having entries corresponding to each of a plurality of pipeline stages of a
microprocessor instruction pipeline.

18. The branch prediction circuit of claim 16, wherein the SBTB cache is dual-ported.

19. The branch prediction circuit of claim 16, wherein the SBTB cache is single-ported.

20. The branch prediction circuit of claim 16, wherein the ABTB cache is single-ported.
21. A processor comprising:
    a fetch unit to speculatively retrieve instruction data for processing by an
        instruction pipeline; and
    a branch prediction circuit, coupled to the fetch unit, to predict final target
        addresses for branch instructions contained within the instruction data, the
        branch prediction circuit including
        a speculative branch target buffer (SBTB) cache having a plurality of
            branch entries to maintain speculative branch data associated with
            in-flight branches, the speculative branch data including a
            speculative history of taken/not-taken outcomes associated with the
            in-flight branches, and
        an architectural branch target buffer (ABTB) cache, coupled to the SBTB
            cache, the ABTB having a plurality of branch entries to maintain
            architectural branch data including the actual taken/not-taken
            outcomes associated with retired conditional branches.

22. The processor of claim 21, wherein the SBTB cache comprises a FIFO having
entries corresponding to each of a plurality of pipeline stages of the instruction
pipeline.
ABSTRACT OF THE DISCLOSURE

A method and apparatus are provided for improving the performance of branch
prediction using a combination of a speculative branch target buffer (SBTB) and an
architectural branch target buffer (ABTB). According to one embodiment, speculative
branch data is maintained for in-flight branches (i.e., those that have been fetched but not
yet retired). A branch entry is speculatively allocated in a line of the SBTB after decoding
an instruction containing a branch, such as a conditional branch, a return from a subroutine,
a call to a subroutine, or an unconditional branch. Subsequently, the branch data associated
with the branch entry is speculatively updated after branch prediction has been completed
for the branch. Finally, the branch data is corrected after the branch has been executed.
According to another embodiment, a novel branch prediction circuit includes both a
speculative branch target buffer (SBTB) cache and an architectural branch target buffer
(ABTB) cache. The SBTB cache contains multiple branch entries to maintain speculative
branch data associated with in-flight branches. The speculative branch data includes a
speculative history of taken/not-taken outcomes associated with the in-flight branches. The
ABTB cache is coupled to the SBTB cache. The ABTB cache also includes multiple
branch entries; however, these entries maintain architectural branch data, including the
actual taken/not-taken outcomes associated with retired conditional branches.
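
The allocate, speculatively update, correct, and retire sequence summarized in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions and is not the disclosed circuit: the names `BranchEntry`, `BranchPredictor`, `push`, and `HISTORY_BITS` are hypothetical, the 4-bit history length is arbitrary, the "predict the most recent outcome" policy is a stand-in for whatever predictor an embodiment uses, and the model assumes at most one in-flight instance of a given branch.

```python
from dataclasses import dataclass

HISTORY_BITS = 4  # assumed history length, chosen only for illustration


@dataclass
class BranchEntry:
    target: int       # target address recorded for the branch
    history: int = 0  # taken/not-taken bits, newest outcome in bit 0


def push(history: int, taken: bool) -> int:
    """Shift one taken/not-taken outcome into a history register."""
    return ((history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)


class BranchPredictor:
    """Toy model: SBTB for in-flight branches, ABTB for retired branches."""

    def __init__(self) -> None:
        self.sbtb: dict[int, BranchEntry] = {}  # speculative branch data
        self.abtb: dict[int, BranchEntry] = {}  # architectural branch data

    def decode(self, pc: int, target: int) -> None:
        # Speculatively allocate an SBTB entry after decode, seeding its
        # history from the ABTB when an architectural entry already exists.
        if pc not in self.sbtb:
            arch = self.abtb.get(pc)
            self.sbtb[pc] = BranchEntry(target, arch.history if arch else 0)

    def predict(self, pc: int) -> bool:
        # Assumed policy: predict whatever the newest history bit says,
        # then speculatively record that prediction in the SBTB history.
        entry = self.sbtb[pc]
        taken = bool(entry.history & 1)
        entry.history = push(entry.history, taken)
        return taken

    def execute(self, pc: int, taken: bool) -> None:
        # Correct the speculative history with the resolved outcome
        # (valid here because only one instance of the branch is in flight).
        entry = self.sbtb[pc]
        entry.history = (entry.history & ~1) | int(taken)

    def retire(self, pc: int, taken: bool, target: int) -> None:
        # Record the actual outcome in the ABTB and free the SBTB entry.
        entry = self.abtb.setdefault(pc, BranchEntry(target))
        entry.target = target
        entry.history = push(entry.history, taken)
        self.sbtb.pop(pc, None)
```

In this model a branch that misses in the ABTB still receives an SBTB entry at decode, so a prediction can be made before the branch has ever retired; the ABTB is written only at retirement and therefore never holds wrong-path outcomes.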
of INTEL CORPORATION; and James R. Thein, Reg. No. 31,710, my patent attorney with full power of
substitution and revocation, to prosecute this application and to transact all business in the Patent and
Trademark Office connected herewith.
APPENDIX B



Title 37, Code of Federal Regulations, Section 1.56 
Duty to Disclose Information Material to Patentability 

(a) A patent by its very nature is affected with a public interest. The public interest is best served,
and the most effective patent examination occurs when, at the time an application is being examined, the
Office is aware of and evaluates the teachings of all information material to patentability. Each individual
associated with the filing and prosecution of a patent application has a duty of candor and good faith in
dealing with the Office, which includes a duty to disclose to the Office all information known to that individual
to be material to patentability as defined in this section. The duty to disclose information exists with respect
to each pending claim until the claim is cancelled or withdrawn from consideration, or the application becomes
abandoned. Information material to the patentability of a claim that is cancelled or withdrawn from
consideration need not be submitted if the information is not material to the patentability of any claim
remaining under consideration in the application. There is no duty to submit information which is not material
to the patentability of any existing claim. The duty to disclose all information known to be material to
patentability is deemed to be satisfied if all information known to be material to patentability of any claim
issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by §§ 1.97(b)-(d)
and 1.98. However, no patent will be granted on an application in connection with which fraud on the Office
was practiced or attempted or the duty of disclosure was violated through bad faith or intentional misconduct.
The Office encourages applicants to carefully examine:

(1) Prior art cited in search reports of a foreign patent office in a counterpart application, and 

(2) The closest information over which individuals associated with the filing or prosecution of a 
patent application believe any pending claim patentably defines, to make sure that any material information 
contained therein is disclosed to the Office. 

(b) Under this section, information is material to patentability when it is not cumulative to 
information already of record or being made of record in the application, and

(1) It establishes, by itself or in combination with other information, a prima facie case of 
unpatentability of a claim; or 

(2) It refutes, or is inconsistent with, a position the applicant takes in: 

(i) Opposing an argument of unpatentability relied on by the Office, or 

(ii) Asserting an argument of patentability. 

A prima facie case of unpatentability is established when the information compels a conclusion that a claim is 
unpatentable under the preponderance of evidence, burden-of-proof standard, giving each term in the claim 
its broadest reasonable construction consistent with the specification, and before any consideration is given to 
evidence which may be submitted in an attempt to establish a contrary conclusion of patentability. 

(c) Individuals associated with the filing or prosecution of a patent application within the 
meaning of this section are: 

(1) Each inventor named in the application; 

(2) Each attorney or agent who prepares or prosecutes the application; and 

(3) Every other person who is substantively involved in the preparation or prosecution of the 
application and who is associated with the inventor, with the assignee or with anyone to whom there is an 
obligation to assign the application. 

(d) Individuals other than the attorney, agent or inventor may comply with this section by 
disclosing information to the attorney, agent, or inventor. 